Support proper numpy integration for ~100x performance boost #259

Merged: VeaaC merged 6 commits into heremaps:master from VeaaC:faster-py on Apr 23, 2026

VeaaC (Collaborator) commented Apr 23, 2026

flatdata-py performance: vectorized access and scalar optimization

What

Adds NumPy-based vectorized field access to flatdata-py and optimizes the scalar (element-by-element) read path. Also fixes a pre-existing bug in read_value() for unaligned 64-bit fields.

Changes

Vectorized access (data_access.py, resources.py)

  • read_field_vectorized(): reads a bit-packed field from all vector elements at once via NumPy, returning an ndarray. Zero-copy over the mmap'd buffer.
  • Vector.__getattr__("field") returns a DataFrame column for the field.
  • Vector.to_numpy() / to_data_frame() return all fields at once.
  • _VectorSlice gets the same vectorized methods.
  • Results are cached per vector instance via _as_numpy_2d().
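
As a rough illustration of the vectorized idea (not the code from this PR), the sketch below reads one bit-packed field from every record of a fixed-size layout in a handful of NumPy operations. The name read_field_vectorized_sketch, its parameters, and the little-endian bit layout are assumptions made for the example, and it only handles fields that fit within 8 bytes.

```python
import numpy as np


def read_field_vectorized_sketch(buf, struct_size, offset_bits, num_bits, signed=False):
    """Read one bit-packed field from every fixed-size record in `buf`."""
    first_byte, shift = divmod(offset_bits, 8)
    if shift + num_bits > 64:
        # Such fields span 9 bytes and need extra handling; out of scope here.
        raise ValueError("sketch only supports fields that fit in 8 bytes")

    # View the buffer as an (n_elements, struct_size) byte matrix; no copy yet.
    raw = np.frombuffer(buf, dtype=np.uint8)
    n = raw.size // struct_size
    rows = raw[: n * struct_size].reshape(n, struct_size)

    # Combine the bytes the field touches into one little-endian integer per row.
    n_bytes = (shift + num_bits + 7) // 8
    value = np.zeros(n, dtype=np.uint64)
    for i in range(n_bytes):
        value |= rows[:, first_byte + i].astype(np.uint64) << np.uint64(8 * i)

    # Drop the low bits belonging to the previous field, then mask off the rest.
    value >>= np.uint64(shift)
    if num_bits < 64:
        value &= np.uint64((1 << num_bits) - 1)

    if signed:
        # Sign-extend via two's-complement wraparound, then reinterpret as int64.
        sign_bit = np.uint64(1 << (num_bits - 1))
        value = ((value ^ sign_bit) - sign_bit).view(np.int64)
    return value


# Four hypothetical 4-byte records: a 10-bit field at bit offset 3,
# surrounded by set bits to exercise the shift and the mask.
vals = [0, 1, 1023, 513]
records = b"".join(
    ((v << 3) | 0b111 | (0x7FFFF << 13)).to_bytes(4, "little") for v in vals
)
unsigned_col = read_field_vectorized_sketch(records, 4, 3, 10)
signed_col = read_field_vectorized_sketch(records, 4, 3, 10, signed=True)
```

The byte-matrix view is zero-copy; the per-column astype calls do allocate, so the cost is proportional to the bytes a field touches and not to the whole record.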

Pre-computed field readers (data_access.py, structure.py)

  • make_field_reader(offset, width, signed) builds a specialized closure with all constants (byte offset, bit shift, mask, sign handling) pre-computed. Six variants cover the cross-product of field types.
  • Structure.__init_subclass__ builds a _READERS dict once per class.
  • __getattr__, as_dict, as_list, as_tuple, as_nparray all use _READERS.
  • read_value() is preserved as a thin wrapper around make_field_reader for one-off reads.
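
To make the closure approach concrete, here is a minimal sketch under stated assumptions: make_field_reader_sketch, StructureSketch, the _FIELDS mapping, and the little-endian layout are all hypothetical stand-ins, and only two variants (signed and unsigned) are shown instead of the six in the PR.

```python
def make_field_reader_sketch(offset_bits, num_bits, signed):
    """Return a closure that reads one field; all constants are computed once."""
    first_byte, shift = divmod(offset_bits, 8)
    end_byte = first_byte + (shift + num_bits + 7) // 8
    mask = (1 << num_bits) - 1
    sign_bit = 1 << (num_bits - 1)

    if signed:
        def read(buf, base=0):
            raw = int.from_bytes(buf[base + first_byte:base + end_byte], "little")
            value = (raw >> shift) & mask
            return (value ^ sign_bit) - sign_bit  # sign-extend
    else:
        def read(buf, base=0):
            raw = int.from_bytes(buf[base + first_byte:base + end_byte], "little")
            return (raw >> shift) & mask
    return read


class StructureSketch:
    _FIELDS = {}   # hypothetical: field name -> (offset_bits, num_bits, signed)
    _READERS = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Build the reader table once per subclass, not once per access.
        cls._READERS = {
            name: make_field_reader_sketch(*spec)
            for name, spec in cls._FIELDS.items()
        }

    def __init__(self, buf, base=0):
        self._buf = buf
        self._base = base

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            reader = type(self)._READERS[name]
        except KeyError:
            raise AttributeError(name) from None
        return reader(self._buf, self._base)


class Point(StructureSketch):
    _FIELDS = {"x": (0, 12, False), "y": (12, 12, True)}


# x = 5 in bits 0..11, y = -3 (two's complement) in bits 12..23
p = Point((5 | ((-3 & 0xFFF) << 12)).to_bytes(4, "little"))
```

Each attribute access then costs one dict lookup and one closure call, with no per-call recomputation of offsets or masks.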

Bug fix (data_access.py)

  • read_value() for 64-bit fields at non-byte-aligned offsets could return values wider than 64 bits: Python ints have arbitrary precision, so the leftover high bits were never truncated. The bit mask was only applied when num_bits < 64, which missed the case offset_extra_bits > 0. Fixed by masking when num_bits < 64 or offset_extra_bits > 0.
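
The failure mode can be reproduced with a small stand-in for read_value(). This is a reconstruction from the description above, not the original function: read_value_sketch and its apply_fix flag are hypothetical, and a little-endian layout is assumed.

```python
def read_value_sketch(buf, offset_bits, num_bits, apply_fix):
    """Read an unsigned bit field; `apply_fix` toggles the corrected mask condition."""
    first_byte, offset_extra_bits = divmod(offset_bits, 8)
    n_bytes = (offset_extra_bits + num_bits + 7) // 8
    value = int.from_bytes(buf[first_byte:first_byte + n_bytes], "little")
    value >>= offset_extra_bits

    if apply_fix:
        needs_mask = num_bits < 64 or offset_extra_bits > 0
    else:
        needs_mask = num_bits < 64  # old condition
    if needs_mask:
        value &= (1 << num_bits) - 1
    return value


# A 64-bit field at bit offset 3 spans 9 bytes (72 bits). After the shift,
# 69 bits remain unless the mask is applied.
data = b"\xff" * 16
buggy = read_value_sketch(data, 3, 64, apply_fix=False)
fixed = read_value_sketch(data, 3, 64, apply_fix=True)
```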

Other

  • __slots__ = () added to generated Structure subclasses (generator template + 10 golden files). Reduces instance size from 72 to 48 bytes.
  • Vector.__iter__ uses local variable caching to avoid repeated attribute lookups.
  • Removed unnecessary list() on dict keys in Archive.__getattr__.
  • Performance tips section added to flatdata-py/README.md.
  • Version bump: flatdata-generator and flatdata-py both 0.4.10 → 0.4.11.
  • CI workflow updated to install local generator before flatdata-py (py.yml).
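
On the __slots__ = () item: a subclass that does not declare __slots__ gets a per-instance __dict__ even when its base class is slotted, and the empty tuple suppresses that. The sketch below assumes a slotted base class; the class and attribute names are hypothetical, and exact byte sizes depend on the interpreter version, so none are asserted.

```python
class BaseSketch:
    # Hypothetical stand-in for a slotted Structure base class.
    __slots__ = ("_mem", "_pos")

    def __init__(self, mem, pos):
        self._mem = mem
        self._pos = pos


class GeneratedWithoutSlots(BaseSketch):
    pass  # implicitly gains a per-instance __dict__


class GeneratedWithSlots(BaseSketch):
    __slots__ = ()  # no new attributes, and no __dict__ either


without_slots = GeneratedWithoutSlots(b"", 0)
with_slots = GeneratedWithSlots(b"", 0)
```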

Performance

Measured on a vector from a test archive (5.8M elements, 20 fields, 32 bytes each):

| Access pattern                     | Before | After  |
|------------------------------------|--------|--------|
| Scalar iteration (1 field)         | 9.7 s  | 5.8 s  |
| Vectorized column access (1 field) | n/a    | 0.07 s |
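
For reference, the ratios implied by these figures can be worked out directly. This is only arithmetic on the numbers reported above, not an additional measurement.

```python
before_scalar = 9.7      # seconds, scalar iteration before this PR
after_scalar = 5.8       # seconds, scalar iteration after this PR
after_vectorized = 0.07  # seconds, vectorized column access

scalar_speedup = before_scalar / after_scalar            # roughly 1.7x
vectorized_vs_before = before_scalar / after_vectorized  # roughly 139x
vectorized_vs_after = after_scalar / after_vectorized    # roughly 83x
```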

VeaaC added 5 commits April 23, 2026 13:43
Signed-off-by: Christian Vetter <christian.vetter@here.com>
VeaaC merged commit 70b9050 into heremaps:master Apr 23, 2026
9 checks passed
VeaaC deleted the faster-py branch April 23, 2026 13:09